Abstract

We present an alternative approach to the Bayesian nonparametric analysis of conditional species richness under two-parameter Poisson-Dirichlet priors. We rely on a known characterization by the deletion of classes property and on results for Beta-Binomial distributions. Besides leading to simplified and much more direct proofs, our proposal provides a new scale mixture representation of the conditional asymptotic law.

1 Introduction 

In Favaro et al. (2009), explicit expressions for Bayesian nonparametric estimators of conditional species richness under two-parameter Poisson-Dirichlet priors are derived to deal with the problem of prediction when the size of the additional sample is very large. The paper also investigates the asymptotic behavior of this quantity in order to obtain asymptotic highest posterior density intervals for the estimates of interest. Despite referring to the Bayesian nonparametric treatment of conditional Gibbs structures introduced in Lijoi et al. (2007, 2008), the proofs are somewhat cumbersome and resort neither to previously established properties of these structures nor to specific results available for the two-parameter Poisson-Dirichlet family.

Here we show how the results in Favaro et al. (2009) may be derived by a much more direct and simpler approach, resorting to the deletion of classes property of the two-parameter Poisson-Dirichlet model (Pitman, 2003) and to known properties of the Beta-Binomial distribution. Moreover, as a by-product, we obtain a new scale mixture representation for the limit law of the conditional species richness which differs from that derived in Favaro et al. (2009).

Notice that, to make the paper easily readable to those unfamiliar with the Bayesian treatment of exchangeable Gibbs partitions, we adopt Pitman's (2006) notation. A preliminary rephrasing of the Lijoi et al. (2007, 2008) approach in terms of Pitman's theory may be found in Cerquetti (2008), where the relationship between conditional Gibbs structures and the operation of deletion of classes was first pointed out. For the sake of clarity and to make the paper self-contained, we open each section with the known results that we will use throughout the paper.

* AMS (2000) subject classification. Primary: 60G58. Secondary: 60G09.

2 Some preliminaries on the two-parameter Poisson-Dirichlet partition model

The two-parameter $(\alpha, \theta)$ Poisson-Dirichlet distribution, for $\alpha \in [0,1)$ and $\theta > -\alpha$, is a model for random partitions (Pitman and Yor, 1997) which belongs to the class of exchangeable Gibbs partitions of type $\alpha \in [-\infty, 1)$ as defined in Gnedin and Pitman (2006). This class is characterized by an exchangeable partition probability function of the form

$$p(n_1, \dots, n_k) = V_{n,k} \prod_{j=1}^{k} (1-\alpha)_{n_j - 1},$$

where $(a)_b = a(a+1)\cdots(a+b-1)$ are rising factorials, with weights $V_{n,k}$ satisfying the backward recursion $V_{n,k} = (n - k\alpha)V_{n+1,k} + V_{n+1,k+1}$. The $(\alpha, \theta)$ Poisson-Dirichlet distribution is well known to arise for

$$V_{n,k} = \frac{(\theta+\alpha)_{k-1\uparrow\alpha}}{(\theta+1)_{n-1}}, \tag{1}$$

where $(x)_{s\uparrow\alpha}$ stands for generalized rising factorials, $(x)_{s\uparrow\alpha} = x(x+\alpha)(x+2\alpha)\cdots(x+(s-1)\alpha)$. This model has been extensively studied in the last twenty years (see e.g. Perman et al. 1992, Pitman, 1995, 1996a, 1996b, Pitman, 2003) and many results are available for it. Here we just recall a few of them that we are going to exploit in the following. A general reference is Pitman (2006).
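As a quick sanity check, the weights in (1) can be tested against the backward recursion in exact rational arithmetic. The sketch below is only illustrative: the helper names `rising` and `V` and the parameter values are assumptions made for this example, not part of the original paper.

```python
from fractions import Fraction

def rising(x, n, step=1):
    """Generalized rising factorial x (x+step) ... (x+(n-1) step); empty product is 1."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i * step
    return out

def V(n, k, alpha, theta):
    """PD(alpha, theta) Gibbs weight of (1): (theta+alpha)_{k-1, step alpha} / (theta+1)_{n-1}."""
    return rising(theta + alpha, k - 1, alpha) / rising(theta + 1, n - 1)

# Illustrative parameter values (hypothetical choice).
alpha, theta = Fraction(1, 3), Fraction(3, 2)

# Backward recursion: V_{n,k} = (n - k alpha) V_{n+1,k} + V_{n+1,k+1}.
recursion_ok = all(
    V(n, k, alpha, theta)
    == (n - k * alpha) * V(n + 1, k, alpha, theta) + V(n + 1, k + 1, alpha, theta)
    for n in range(1, 8)
    for k in range(1, n + 1)
)
print(recursion_ok)
```

Exact fractions are used so that the recursion is checked as an identity rather than up to floating-point error.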

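For $S_{n,k}^{-1,-\alpha}$ the generalized Stirling numbers of the first kind (see e.g. Hsu & Shiue, 1998), the law of the number of blocks $K_n$ observed in an $n$-sample, for $k \in \{1, \dots, n\}$, is given by

$$\mathbb{P}_{\alpha,\theta}(K_n = k) = \frac{(\theta+\alpha)_{k-1\uparrow\alpha}}{(\theta+1)_{n-1}}\, S_{n,k}^{-1,-\alpha}, \tag{2}$$

with expected value equal to

$$\mathbb{E}_{\alpha,\theta}(K_n) = \frac{(\theta+\alpha)_n}{\alpha(\theta+1)_{n-1}} - \frac{\theta}{\alpha}. \tag{3}$$

A general expression for the moments of any order of $K_n$ has been obtained in Yamato and Sibuya (2000) and Pitman (1996b) in terms of non-central Stirling numbers of the second kind $S_{r,j}^{0,1,\theta/\alpha}$, and corresponds to

$$\mathbb{E}_{\alpha,\theta}(K_n^r) = \sum_{j=0}^{r} (-1)^{r-j}\left(\frac{\theta}{\alpha}+1\right)_j S_{r,j}^{0,1,\theta/\alpha}\, \frac{(\theta+j\alpha+1)_{n-1}}{(\theta+1)_{n-1}}. \tag{4}$$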
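The closed form (3) can be compared with the exact law of $K_n$ built one observation at a time. The sketch below assumes the sequential prediction rule implied by the weights (1), namely that a new block appears with probability $V_{i+1,j+1}/V_{i,j} = (\theta + j\alpha)/(\theta + i)$ when $i$ observations occupy $j$ blocks; function names and parameter values are hypothetical.

```python
from fractions import Fraction

def rising(x, n):
    """Rising factorial x (x+1) ... (x+n-1); the empty product is 1."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out

def blocks_law(n, alpha, theta):
    """Exact law of K_n from the prediction rule: with i observations in j blocks,
    a new block appears with probability (theta + j alpha) / (theta + i)."""
    law = {0: Fraction(1)}
    for i in range(n):
        nxt = {}
        for j, p in law.items():
            p_new = (theta + j * alpha) / (theta + i)
            nxt[j + 1] = nxt.get(j + 1, 0) + p * p_new
            nxt[j] = nxt.get(j, 0) + p * (1 - p_new)
        law = nxt
    return law

def mean_closed_form(n, alpha, theta):
    """Right-hand side of (3)."""
    return rising(theta + alpha, n) / (alpha * rising(theta + 1, n - 1)) - theta / alpha

# Illustrative parameter values (hypothetical choice, theta > 0).
alpha, theta = Fraction(1, 3), Fraction(3, 2)
checks = [
    sum(j * p for j, p in blocks_law(n, alpha, theta).items())
    == mean_closed_form(n, alpha, theta)
    for n in range(1, 10)
]
print(all(checks))
```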



Now, conditional Gibbs structures have been introduced as tools for a Bayesian nonparametric approach to species sampling problems in Lijoi et al. (2007, 2008). In this setting, given a sample $(X_1, \dots, X_n)$, with $(n_1, \dots, n_k)$ the vector of observed multiplicities of each species represented, interest typically lies in the law of the number $K_m$ of new distinct species observed in an additional $m$-sample $(X_{n+1}, \dots, X_{n+m})$. The general form of this distribution was first obtained in Lijoi et al. (2007, cf. Proposition 1) by combinatorial arguments, and may be expressed in terms of non-central generalized Stirling numbers of the first kind (cf. Cerquetti, 2008, Eq. 32) as follows

$$\mathbb{P}(K_m = k^* | n_1, \dots, n_k) = \mathbb{P}(K_m = k^* | K_n = k) = \frac{V_{n+m,k+k^*}}{V_{n,k}}\, S_{m,k^*}^{-1,-\alpha,-(n-k\alpha)} \tag{5}$$

for $k^* \in \{0, 1, \dots, m\}$. This formula specializes under the $(\alpha, \theta)$ Poisson-Dirichlet model as

$$\mathbb{P}_{\alpha,\theta}(K_m = k^* | K_n = k) = \frac{(\theta+k\alpha)_{k^*\uparrow\alpha}}{(\theta+n)_m}\, S_{m,k^*}^{-1,-\alpha,-(n-k\alpha)}, \tag{6}$$

whose expected value, given by

$$\mathbb{E}_{\alpha,\theta}(K_m | K_n = k) = \sum_{k^*=0}^{m} k^*\, \frac{(\theta+k\alpha)_{k^*\uparrow\alpha}}{(\theta+n)_m}\, S_{m,k^*}^{-1,-\alpha,-(n-k\alpha)}, \tag{7}$$

plays the role of a Bayesian estimator for $K_m$.

3 Conditional analysis for species richness under two-parameter 
Poisson-Dirichlet priors 

Favaro et al. (2009) start from the need for an alternative expression for (7) that reduces the computational effort needed to calculate both (7) and Bayesian estimators for related quantities of interest in species sampling problems. These essentially reduce to the discovery probability, i.e. the probability of discovering a new species at the $(n+m+1)$th draw without observing the $m$ intermediate records, and the sample coverage, i.e. the proportion of species represented in a sample of given size featuring a certain number of distinct species.

Here is our approach to the problem. Let $S_m$ be the number of observations in the additional $m$-sample belonging to new species, with values in $\{0, \dots, m\}$. By the basic rules of conditional probability we can always write (5) as

$$\mathbb{P}(K_m = k^* | K_n = k) = \sum_{s=k^*}^{m} \mathbb{P}(K_m = k^*, S_m = s | K_n = k) = \sum_{s=k^*}^{m} \mathbb{P}(K_m = k^* | K_n = k, S_m = s)\, \mathbb{P}(S_m = s | K_n = k). \tag{8}$$

The general form of $\mathbb{P}(S_m = s | K_n = k)$ for Gibbs partitions of type $\alpha \in [0, 1)$ has been derived in Lijoi et al. (2008, cf. Eq. (11)) and, expressed in terms of generalized Stirling numbers as in Cerquetti (2008), is given by

$$\mathbb{P}(S_m = s | K_n = k) = \frac{1}{V_{n,k}} \binom{m}{s} (n - k\alpha)_{m-s} \sum_{k^*=0}^{s} V_{n+m,k+k^*}\, S_{s,k^*}^{-1,-\alpha}. \tag{9}$$

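This formula specializes under the $(\alpha, \theta)$ Poisson-Dirichlet model as follows. First notice that

$$\frac{V_{n+m,k}}{V_{n,k}} = \frac{1}{(\theta+n)_m},$$

then, by means of the multiplicative property of generalized rising factorials and the definition of generalized Stirling numbers as connection coefficients, the sum in (9) reduces to

$$\sum_{k^*=0}^{s} \frac{(\theta+\alpha)_{k+k^*-1\uparrow\alpha}}{(\theta+1)_{n+m-1}}\, S_{s,k^*}^{-1,-\alpha} = \frac{(\theta+\alpha)_{k-1\uparrow\alpha}}{(\theta+1)_{n+m-1}} \sum_{k^*=0}^{s} (\theta+k\alpha)_{k^*\uparrow\alpha}\, S_{s,k^*}^{-1,-\alpha} = \frac{(\theta+\alpha)_{k-1\uparrow\alpha}}{(\theta+1)_{n+m-1}}\, (\theta+k\alpha)_s.$$

It follows that (9) specializes under the $(\alpha, \theta)$ model as

$$\mathbb{P}_{\alpha,\theta}(S_m = s | K_n = k) = \binom{m}{s} (n - k\alpha)_{m-s}\, \frac{V_{n+m,k}}{V_{n,k}}\, (\theta+k\alpha)_s = \binom{m}{s} \frac{(n-k\alpha)_{m-s}\, (\theta+k\alpha)_s}{(\theta+n)_m}, \tag{10}$$

which is a Beta-Binomial distribution of parameters $(m, \theta+k\alpha, n-k\alpha)$.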
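A small numerical illustration of (10) may help: the probabilities should sum to one and have the Beta-Binomial mean $m(\theta+k\alpha)/(\theta+n)$. The following sketch evaluates both in exact arithmetic; the function name `p_new_obs` and the parameter values are assumptions chosen for the example.

```python
from fractions import Fraction
from math import comb

def rising(x, n):
    """Rising factorial x (x+1) ... (x+n-1); the empty product is 1."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out

def p_new_obs(s, m, n, k, alpha, theta):
    """Formula (10): probability that s of the m additional observations fall in new species."""
    return (comb(m, s) * rising(n - k * alpha, m - s)
            * rising(theta + k * alpha, s) / rising(theta + n, m))

# Illustrative values (hypothetical): basic sample of n = 10 with k = 4 species, m = 6 new draws.
alpha, theta, n, k, m = Fraction(1, 2), Fraction(1), 10, 4, 6
pmf = [p_new_obs(s, m, n, k, alpha, theta) for s in range(m + 1)]
total = sum(pmf)
mean = sum(s * p for s, p in enumerate(pmf))
print(total, mean, m * (theta + k * alpha) / (theta + n))
```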




Remark 1. In Lijoi et al. (2008) an analogous derivation of (10) in terms of generalized factorial coefficients (see Charalambides, 2005) is given in Example 3.2. Nevertheless, the relationship with Beta-Binomial distributions is not highlighted there (e.g. to derive the expected value they resort to a general formula in Proposition 2). Notice that Beta-Binomial distributions (see e.g. Johnson and Kotz, 1977, 2005) can be seen as a generalization to non-integer parameters of Polya urn distributions of parameters $(a, b, c = 1)$, for $a$ and $b$ the initial composition of the urn and $c$ the number of balls of the same color added to the urn together with the ball observed. Asymptotic results for Polya distributions extend to Beta-Binomial models, something that we will exploit in the following sections.

As for $\mathbb{P}(K_m = k^* | K_n = k, S_m = s)$, this is the law of the number of blocks for a conditional Gibbs structure as defined in Lijoi et al. (2008, Prop. 3). As shown in Cerquetti (2008, Section 4.1), the operation of conditioning on the number $s$ of observations in the new blocks is equivalent to conditioning on the number $m - s$ of observations in old blocks, i.e. on the vector $(m_1, \dots, m_k)$, and corresponds to the operation of deletion of the first $k$ classes as defined in Pitman (2003).

Definition 2 [Deletion of classes, Pitman (2003)] Given a random partition $\Pi$ of $\mathbb{N}$, the operator deletion of the first $k$ classes is defined as follows: first let $\Pi'_k$ be the restriction of $\Pi$ to $H_k := \mathbb{N} - G_1 - \cdots - G_k$, where $G_1, \dots, G_k$ are the first $k$ classes of $\Pi$ in order of their least elements; then derive $\Pi_k$ on $\mathbb{N}$ from $\Pi'_k$ on $H_k$ by renumbering the points of $H_k$ in increasing order.

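Pitman (2003) shows that this operation characterizes the two-parameter Poisson-Dirichlet family of distributions for $\alpha \in (0, 1)$, in that it produces a Gibbs partition still belonging to the Poisson-Dirichlet class with updated parameter $(\alpha, \theta + k\alpha)$ (see Gnedin et al. 2009 for recent results and a comprehensive treatment of the topic). It follows that, by (2),

$$\mathbb{P}_{\alpha,\theta}(K_m = k^* | K_n = k, S_m = s) = \mathbb{P}_{\alpha,\theta+k\alpha}(K_s = k^*) = \frac{(\theta+k\alpha+\alpha)_{k^*-1\uparrow\alpha}}{(\theta+k\alpha+1)_{s-1}}\, S_{s,k^*}^{-1,-\alpha}. \tag{11}$$

Substituting (10) and (11) in (8), and writing $(\theta+k\alpha+\alpha)_{k^*-1\uparrow\alpha} = \alpha^{k^*-1}\, \Gamma(\theta/\alpha+k+k^*)/\Gamma(\theta/\alpha+k+1)$, gives

$$\mathbb{P}_{\alpha,\theta}(K_m = k^* | K_n = k) = \sum_{s=k^*}^{m} \frac{(\theta+k\alpha+\alpha)_{k^*-1\uparrow\alpha}}{(\theta+k\alpha+1)_{s-1}}\, S_{s,k^*}^{-1,-\alpha} \binom{m}{s} \frac{(n-k\alpha)_{m-s}\, (\theta+k\alpha)_s}{(\theta+n)_m} = \frac{(\theta+k\alpha)\, \alpha^{k^*-1}}{(\theta+n)_m}\, \frac{\Gamma(\theta/\alpha+k+k^*)}{\Gamma(\theta/\alpha+k+1)} \sum_{s=k^*}^{m} \binom{m}{s} (n-k\alpha)_{m-s}\, S_{s,k^*}^{-1,-\alpha},$$

and by the definition of non-central generalized Stirling numbers as connection coefficients yields

$$= \frac{(\theta+k\alpha)_{k^*\uparrow\alpha}}{(\theta+n)_m}\, S_{m,k^*}^{-1,-\alpha,-(n-k\alpha)},$$

in agreement with (6).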
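The decomposition (8), with the Beta-Binomial weights (10) and the deletion of classes identity (11), can be checked numerically without evaluating any Stirling number. The sketch below computes the conditional law of $K_m$ directly from the sequential prediction rule started at $(n, k)$ and compares it with the mixture of PD$(\alpha, \theta+k\alpha)$ laws. The prediction rule (new species with probability $(\theta + j\alpha)/(\theta + i)$ when $i$ observations show $j$ species) is assumed from the weights (1); all names and parameter values are hypothetical.

```python
from fractions import Fraction
from math import comb

def rising(x, n):
    """Rising factorial x (x+1) ... (x+n-1); the empty product is 1."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out

def new_blocks_law(steps, alpha, theta, n0=0, k0=0):
    """Exact law of the number of new blocks created by `steps` further observations,
    starting from n0 observations in k0 blocks (sequential prediction rule)."""
    law = {0: Fraction(1)}
    for i in range(steps):
        nxt = {}
        for j, p in law.items():
            p_new = (theta + (k0 + j) * alpha) / (theta + n0 + i)
            nxt[j + 1] = nxt.get(j + 1, 0) + p * p_new
            nxt[j] = nxt.get(j, 0) + p * (1 - p_new)
        law = nxt
    return law

def mixture_law(m, n, k, alpha, theta):
    """Right-hand side of (8): the PD(alpha, theta + k alpha) law of K_s, as in (11),
    mixed over the Beta-Binomial weights (10)."""
    law = {}
    for s in range(m + 1):
        weight = (comb(m, s) * rising(n - k * alpha, m - s)
                  * rising(theta + k * alpha, s) / rising(theta + n, m))
        for k_star, p in new_blocks_law(s, alpha, theta + k * alpha).items():
            law[k_star] = law.get(k_star, 0) + weight * p
    return law

# Illustrative values (hypothetical choice).
alpha, theta, n, k, m = Fraction(1, 2), Fraction(1), 10, 4, 6
direct = new_blocks_law(m, alpha, theta, n0=n, k0=k)
mixed = mixture_law(m, n, k, alpha, theta)
agree = all(direct.get(j, 0) == mixed.get(j, 0) for j in range(m + 1))
print(agree)
```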

Now we show how the approach described in the present section applies to the study of the expected value, the moments and the asymptotic behaviour of $(K_m | K_n = k)$ under the $(\alpha, \theta)$ Poisson-Dirichlet model.

3.1 Moments 

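By the mixture representation (8), and the simplification induced by the deletion of classes property, the moments of any order for the number of species in the additional sample conditional on the basic sample are given by:

$$\mathbb{E}_{\alpha,\theta}(K_m^r | K_n = k) = \sum_{s=0}^{m} \mathbb{E}_{\alpha,\theta}(K_m^r | S_m = s, K_n = k)\, \mathbb{P}_{\alpha,\theta}(S_m = s | K_n = k). \tag{12}$$

For $r = 1$ deriving $\mathbb{E}_{\alpha,\theta}(K_m | S_m = s, K_n = k)$ is just a matter of specializing (3),

$$\mathbb{E}_{\alpha,\theta}(K_m | S_m = s, K_n = k) = \mathbb{E}_{\alpha,\theta+k\alpha}(K_s) = \frac{(\theta+k\alpha+\alpha)_s}{\alpha(\theta+k\alpha+1)_{s-1}} - \frac{\theta+k\alpha}{\alpha}. \tag{13}$$

For $r > 1$ specializing (4) yields

$$\mathbb{E}_{\alpha,\theta}(K_m^r | S_m = s, K_n = k) = \sum_{j=0}^{r} (-1)^{r-j} \left(\frac{\theta+k\alpha+\alpha}{\alpha}\right)_j S_{r,j}^{0,1,(\theta+k\alpha)/\alpha}\, \frac{(\theta+k\alpha+j\alpha+1)_{s-1}}{(\theta+k\alpha+1)_{s-1}}. \tag{14}$$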



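We are now in a position to prove Proposition 1 of Favaro et al. (2009) in a much more direct and simple fashion.

Proposition 3. Under the $(\alpha, \theta)$ Poisson-Dirichlet model an explicit expression for the expected value of $K_m$ conditioned on the number of blocks $K_n$ observed in the basic $n$-sample is as follows

$$\mathbb{E}_{\alpha,\theta}(K_m | K_n = k) = \frac{\theta+k\alpha}{\alpha} \left[\frac{(\theta+\alpha+n)_m}{(\theta+n)_m} - 1\right]. \tag{15}$$

Proof: By (10), (12) and (13)

$$\begin{aligned}
\mathbb{E}_{\alpha,\theta}(K_m | K_n = k) &= \sum_{s=0}^{m} \mathbb{E}_{\alpha,\theta+k\alpha}(K_s)\, \mathbb{P}_{\alpha,\theta}(S_m = s | K_n = k) \\
&= \sum_{s=0}^{m} \left[\frac{(\theta+k\alpha+\alpha)_s}{\alpha(\theta+k\alpha+1)_{s-1}} - \frac{\theta+k\alpha}{\alpha}\right] \binom{m}{s} \frac{(n-k\alpha)_{m-s}\, (\theta+k\alpha)_s}{(\theta+n)_m} \\
&= \frac{\theta+k\alpha}{\alpha} \sum_{s=0}^{m} \left(\frac{(\theta+k\alpha+\alpha)_s}{(\theta+k\alpha)_s} - 1\right) \binom{m}{s} \frac{(\theta+k\alpha)_s\, (n-k\alpha)_{m-s}}{(\theta+n)_m} \\
&= \frac{\theta+k\alpha}{\alpha} \left[\sum_{s=0}^{m} \binom{m}{s} \frac{(\theta+k\alpha+\alpha)_s\, (n-k\alpha)_{m-s}}{(\theta+n)_m} - 1\right] \\
&= \frac{\theta+k\alpha}{\alpha} \left[\frac{(\theta+\alpha+n)_m}{(\theta+n)_m} - 1\right],
\end{aligned}$$

where the last step uses the Chu-Vandermonde identity $\sum_{s=0}^{m} \binom{m}{s} (a)_s (b)_{m-s} = (a+b)_m$ with $a = \theta+k\alpha+\alpha$ and $b = n - k\alpha$. □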
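The two ends of the proof of Proposition 3 can be compared numerically. The sketch below evaluates the mixture (12) for $r = 1$, using the form $\mathbb{E}_{\alpha,\theta}(K_s) = (\theta/\alpha)[(\theta+\alpha)_s/(\theta)_s - 1]$ that appears in the third line of the proof, and checks it against the closed form of the proposition. Function names and parameter values are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def rising(x, n):
    """Rising factorial x (x+1) ... (x+n-1); the empty product is 1."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out

def mean_blocks(s, alpha, theta):
    """E(K_s) under PD(alpha, theta), written as (theta/alpha) [ (theta+alpha)_s / (theta)_s - 1 ]."""
    return theta / alpha * (rising(theta + alpha, s) / rising(theta, s) - 1)

def mixture_mean(m, n, k, alpha, theta):
    """Right-hand side of (12) for r = 1: E(K_s) under PD(alpha, theta + k alpha) mixed over (10)."""
    total = Fraction(0)
    for s in range(m + 1):
        weight = (comb(m, s) * rising(n - k * alpha, m - s)
                  * rising(theta + k * alpha, s) / rising(theta + n, m))
        total += mean_blocks(s, alpha, theta + k * alpha) * weight
    return total

def closed_form_mean(m, n, k, alpha, theta):
    """Proposition 3: ((theta + k alpha)/alpha) [ (theta+alpha+n)_m / (theta+n)_m - 1 ]."""
    return (theta + k * alpha) / alpha * (rising(theta + alpha + n, m) / rising(theta + n, m) - 1)

# Illustrative values (hypothetical choice).
alpha, theta, n, k = Fraction(1, 2), Fraction(1), 10, 4
same = all(
    mixture_mean(m, n, k, alpha, theta) == closed_form_mean(m, n, k, alpha, theta)
    for m in range(0, 9)
)
print(same)
```

For $m = 1$ the closed form reduces to $(\theta+k\alpha)/(\theta+n)$, the probability that the next observation is a new species.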



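Proposition 4. Under the $(\alpha, \theta)$ Poisson-Dirichlet model an explicit expression for the moments of any order for $(K_m | K_n = k)$ is given by

$$\mathbb{E}_{\alpha,\theta}(K_m^r | K_n = k) = \sum_{j=0}^{r} (-1)^{r-j} \left(\frac{\theta+k\alpha}{\alpha}\right)_j S_{r,j}^{0,1,(\theta+k\alpha)/\alpha}\, \frac{(\theta+n+j\alpha)_m}{(\theta+n)_m}. \tag{16}$$

Proof. By (10), (12) and (14)

$$\begin{aligned}
\mathbb{E}_{\alpha,\theta}(K_m^r | K_n = k) &= \sum_{s=0}^{m} \sum_{j=0}^{r} (-1)^{r-j} \left(\frac{\theta+k\alpha+\alpha}{\alpha}\right)_j S_{r,j}^{0,1,(\theta+k\alpha)/\alpha}\, \frac{(\theta+k\alpha+j\alpha+1)_{s-1}}{(\theta+k\alpha+1)_{s-1}} \binom{m}{s} \frac{(\theta+k\alpha)_s\, (n-k\alpha)_{m-s}}{(\theta+n)_m} \\
&= \sum_{j=0}^{r} (-1)^{r-j} \left(\frac{\theta+k\alpha+\alpha}{\alpha}\right)_j S_{r,j}^{0,1,(\theta+k\alpha)/\alpha}\, \frac{\theta+k\alpha}{\theta+k\alpha+j\alpha} \sum_{s=0}^{m} \binom{m}{s} \frac{(\theta+k\alpha+j\alpha)_s\, (n-k\alpha)_{m-s}}{(\theta+n)_m} \\
&= \sum_{j=0}^{r} (-1)^{r-j} \left(\frac{\theta+k\alpha}{\alpha}\right)_j S_{r,j}^{0,1,(\theta+k\alpha)/\alpha}\, \frac{(\theta+n+j\alpha)_m}{(\theta+n)_m}.
\end{aligned}$$

□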
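To illustrate (16) for $r = 2$, the sketch below compares it with the second moment obtained from the exact conditional law of $K_m$. Two assumptions are made here and are not taken from the paper: the non-central Stirling numbers of the second kind are taken in the convention $t^r = \sum_j S_{r,j}^{0,1,\gamma} (t-\gamma)(t-\gamma-1)\cdots(t-\gamma-j+1)$, which gives $S_{2,0} = \gamma^2$, $S_{2,1} = 2\gamma+1$, $S_{2,2} = 1$; and the exact law is built from the sequential prediction rule implied by (1). Names and parameter values are hypothetical.

```python
from fractions import Fraction

def rising(x, n):
    """Rising factorial x (x+1) ... (x+n-1); the empty product is 1."""
    out = Fraction(1)
    for i in range(n):
        out *= x + i
    return out

def second_moment_formula(m, n, k, alpha, theta):
    """Formula (16) with r = 2, using S_{2,0} = g^2, S_{2,1} = 2g + 1, S_{2,2} = 1
    for g = (theta + k alpha)/alpha (assumed convention, see the lead-in)."""
    g = (theta + k * alpha) / alpha
    stirling = [g * g, 2 * g + 1, Fraction(1)]
    total = Fraction(0)
    for j in range(3):
        ratio = rising(theta + n + j * alpha, m) / rising(theta + n, m)
        total += (-1) ** (2 - j) * rising(g, j) * stirling[j] * ratio
    return total

def second_moment_sequential(m, n, k, alpha, theta):
    """E(K_m^2 | K_n = k) from the exact law of the number of new species,
    built one draw at a time starting from n observations and k species."""
    law = {0: Fraction(1)}
    for i in range(m):
        nxt = {}
        for j, p in law.items():
            p_new = (theta + (k + j) * alpha) / (theta + n + i)
            nxt[j + 1] = nxt.get(j + 1, 0) + p * p_new
            nxt[j] = nxt.get(j, 0) + p * (1 - p_new)
        law = nxt
    return sum(j * j * p for j, p in law.items())

# Illustrative values (hypothetical choice).
alpha, theta, n, k = Fraction(1, 2), Fraction(1), 10, 4
same = all(
    second_moment_formula(m, n, k, alpha, theta) == second_moment_sequential(m, n, k, alpha, theta)
    for m in range(0, 9)
)
print(same)
```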



Remark 5. As the name suggests, Beta-Binomial distributions arise as Beta mixtures of Binomial models, i.e. they are models for the number of successes in a sequence of independent trials once the probability of success has been randomized according to a Beta distribution. This, to some extent, clarifies the proof of Proposition 1 in Favaro et al. (2009). In fact, although they do not consider mixing explicitly over $(S_m | K_n = k)$, their proofs work by a multistep procedure that ends up in a double conditional mixing, both with a Binomial distribution and with a Beta distribution.



In the next section we apply our approach to the study of the asymptotic properties of $K_m$ given $K_n$ and show how it strongly simplifies the derivation of the relevant results. As a by-product we obtain a new decomposition for the limit law, different from that obtained in Favaro et al. (2009) but still a scale mixture of a Beta density and a transformation of the Mittag-Leffler density. For the implementation of this kind of result in Bayesian nonparametric genomic applications, and for the need to derive asymptotic distributions in connection with HPD intervals, see Favaro et al. (2009).

3.2 Asymptotics 

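We start by recalling known results that we will use in the following. First, a local limit law for the number of blocks under the $(\alpha, \theta)$ Poisson-Dirichlet model can be found e.g. in Pitman (2006). As $n \to \infty$

$$\mathbb{P}_{\alpha,\theta}(K_n = k) \sim g_{\alpha,\theta}(z)\, n^{-\alpha} \tag{17}$$

with $k \sim z n^{\alpha}$, where for $z > 0$

$$g_{\alpha,\theta}(z) := \frac{\Gamma(\theta+1)}{\Gamma\left(\frac{\theta}{\alpha}+1\right)}\, z^{\theta/\alpha}\, g_{\alpha}(z), \tag{18}$$

and $g_{\alpha}(\cdot)$ is the Mittag-Leffler density

$$g_{\alpha}(z) = \alpha^{-1} z^{-1-1/\alpha} f_{\alpha}(z^{-1/\alpha}), \tag{19}$$

for $f_{\alpha}(\cdot)$ the $\alpha$-stable density with $\alpha \in (0, 1)$. This implies that under $\mathbb{P}_{\alpha,\theta}$ (see Th. 8 in Pitman, 2003) almost surely and in $r$-th mean

$$\frac{K_n}{n^{\alpha}} \to Y_{\theta/\alpha} \tag{20}$$

for $f_{Y_{\theta/\alpha}}(z) = g_{\alpha,\theta}(z)$. Again from Pitman (2006) we also know that, as $n \to \infty$, for $\alpha \in (0, 1)$

$$\mathbb{E}_{\alpha,\theta}(K_n) \sim n^{\alpha}\, \frac{\Gamma(\theta+1)}{\alpha\Gamma(\theta+\alpha)} \tag{21}$$

and for each $r > 0$

$$\mathbb{E}_{\alpha,\theta}(K_n^r) \sim n^{r\alpha}\, \frac{\Gamma(\theta/\alpha+r+1)\, \Gamma(\theta+1)}{\Gamma(\theta+r\alpha+1)\, \Gamma(\theta/\alpha+1)}.$$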
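It follows that for a PD$(\alpha, \theta + k\alpha)$ model we have

$$\mathbb{E}_{\alpha,\theta+k\alpha}(K_s) \sim s^{\alpha}\, \frac{\Gamma(\theta+k\alpha+1)}{\alpha\Gamma(\theta+k\alpha+\alpha)}, \tag{22}$$

and for the $r$-th moment

$$\mathbb{E}_{\alpha,\theta+k\alpha}(K_s^r) \sim s^{r\alpha}\, \frac{\Gamma(\theta/\alpha+k+r+1)\, \Gamma(\theta+k\alpha+1)}{\Gamma(\theta+k\alpha+r\alpha+1)\, \Gamma(\theta/\alpha+k+1)}. \tag{23}$$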

Adopting our approach, obtaining a local limit for the moments of $(K_m | K_n = k)$ as in Favaro et al. (2009, Prop. 2) is just a matter of mixing (23) over $s$ with a local limit law for $S_m | K_n = k$,

$$\mathbb{E}_{\alpha,\theta}(K_m^r | K_n = k) = \int_0^m \mathbb{E}_{\alpha,\theta+k\alpha}(K_s^r)\, f_{S_m | K_n = k}(s)\, ds. \tag{24}$$

Notice that, by the definition of rising factorials in terms of the Gamma function, $(x)_s = \Gamma(x+s)/\Gamma(x)$, (10) may be written as

$$\mathbb{P}_{\alpha,\theta}(S_m = s | K_n = k) = \frac{\Gamma(\theta+n)}{\Gamma(\theta+k\alpha)\, \Gamma(n-k\alpha)}\, \frac{\Gamma(\theta+k\alpha+s)}{\Gamma(s+1)}\, \frac{\Gamma(n-k\alpha+m-s)}{\Gamma(m-s+1)}\, \frac{\Gamma(m+1)}{\Gamma(\theta+n+m)},$$

and by Stirling approximation, i.e. $\Gamma(m+a)/\Gamma(m+b) \sim m^{a-b}$ as $m \to \infty$, a local limit law for $(S_m | K_n = k)$, for $s \in (0, m)$, is given by

$$f_{S_m | K_n = k}(s) \sim \frac{\Gamma(\theta+n)}{\Gamma(\theta+k\alpha)\, \Gamma(n-k\alpha)}\, s^{\theta+k\alpha-1} (m-s)^{n-k\alpha-1} m^{-(\theta+n-1)}. \tag{25}$$

In the next Proposition we obtain the general result for $r \geq 1$ from (23). The case $r = 1$ may be alternatively derived applying the same operation to (22).

Proposition 6. Under the $(\alpha, \theta)$ Poisson-Dirichlet model the asymptotic behaviour of the $r$-th moment of $(K_m | K_n = k)$ is described by the following approximation

$$\mathbb{E}_{\alpha,\theta}(K_m^r | K_n = k) \sim \left(\frac{\theta+k\alpha}{\alpha}\right)_r \frac{\Gamma(\theta+n)}{\Gamma(\theta+n+r\alpha)}\, m^{r\alpha}. \tag{26}$$



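Proof. By (24) and (25)

$$\mathbb{E}_{\alpha,\theta}(K_m^r | K_n = k) \sim \frac{\Gamma(\theta/\alpha+k+r+1)\, \Gamma(\theta+k\alpha+1)}{\Gamma(\theta+k\alpha+r\alpha+1)\, \Gamma(\theta/\alpha+k+1)} \int_0^m \frac{\Gamma(\theta+n)}{\Gamma(\theta+k\alpha)\, \Gamma(n-k\alpha)}\, s^{\theta+k\alpha+r\alpha-1} (m-s)^{n-k\alpha-1} m^{-(\theta+n-1)}\, ds;$$

multiplying and dividing by $\Gamma(\theta+n+r\alpha)$ it simplifies to

$$= \left(\frac{\theta+k\alpha}{\alpha}\right)_r \frac{\Gamma(\theta+n)}{\Gamma(\theta+n+r\alpha)} \int_0^m \frac{\Gamma(\theta+n+r\alpha)}{\Gamma(\theta+k\alpha+r\alpha)\, \Gamma(n-k\alpha)}\, s^{\theta+k\alpha+r\alpha-1} (m-s)^{n-k\alpha-1} m^{-(\theta+n-1)}\, ds,$$

and by a change of variable, for $w = s/m$ and $ds = m\, dw$,

$$= \left(\frac{\theta+k\alpha}{\alpha}\right)_r \frac{\Gamma(\theta+n)}{\Gamma(\theta+n+r\alpha)}\, m^{r\alpha} \int_0^1 \frac{\Gamma(\theta+n+r\alpha)}{\Gamma(\theta+k\alpha+r\alpha)\, \Gamma(n-k\alpha)}\, w^{\theta+k\alpha+r\alpha-1} (1-w)^{n-k\alpha-1}\, dw,$$

and the result follows, since the last integrand is a Beta$(\theta+k\alpha+r\alpha, n-k\alpha)$ density.

□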
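For $r = 1$ the approximation of Proposition 6 can be compared with the exact expression of Proposition 3. The sketch below, which relies only on `math.lgamma` from the standard library, computes the ratio between the exact conditional mean and its asymptotic approximation for increasing $m$; the ratio is expected to approach one, at the slow rate induced by the $-1$ term in Proposition 3. Function names and parameter values are assumptions made for illustration.

```python
from math import exp, lgamma, log

def exact_mean(m, n, k, alpha, theta):
    """Proposition 3, with the ratio of rising factorials evaluated through log-Gamma."""
    log_ratio = (lgamma(theta + alpha + n + m) - lgamma(theta + alpha + n)
                 - lgamma(theta + n + m) + lgamma(theta + n))
    return (theta + k * alpha) / alpha * (exp(log_ratio) - 1.0)

def asymptotic_mean(m, n, k, alpha, theta):
    """Proposition 6 with r = 1:
    ((theta + k alpha)/alpha) * Gamma(theta+n)/Gamma(theta+n+alpha) * m^alpha."""
    log_const = lgamma(theta + n) - lgamma(theta + n + alpha)
    return (theta + k * alpha) / alpha * exp(log_const + alpha * log(m))

# Illustrative values (hypothetical choice).
alpha, theta, n, k = 0.5, 1.0, 10, 4
ratios = [
    exact_mean(m, n, k, alpha, theta) / asymptotic_mean(m, n, k, alpha, theta)
    for m in (10**2, 10**4, 10**6)
]
print(ratios)
```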



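As for the asymptotic law of $K_m | K_n = k$, first notice that, as $m \to \infty$, we can always write

$$\left(\frac{K_m}{m^{\alpha}}\, \Big|\, K_n = k\right) = \left(\frac{K_m}{S_m^{\alpha}}\, \frac{S_m^{\alpha}}{m^{\alpha}}\, \Big|\, K_n = k\right),$$

which can be rewritten as a product of independent random variables as

$$\left(\frac{K_m}{S_m^{\alpha}}\, \Big|\, K_n = k, S_m\right) \left(\frac{S_m^{\alpha}}{m^{\alpha}}\, \Big|\, K_n = k\right).$$

Now, by the deletion of classes property of the $(\alpha, \theta)$ Poisson-Dirichlet model,

$$\left(\frac{K_m}{S_m^{\alpha}}\, \Big|\, K_n = k, S_m = s\right) \overset{d}{=} \frac{K_s}{s^{\alpha}} \quad \text{under } \mathbb{P}_{\alpha,\theta+k\alpha},$$

and by (20) almost surely and in $r$-th mean

$$\frac{K_s}{s^{\alpha}} \to Y_{(\theta+k\alpha)/\alpha},$$

whose limit distribution, by an application of (18), for $y > 0$ is given by

$$f_{Y_{(\theta+k\alpha)/\alpha}}(y) = g_{\alpha,(\theta+k\alpha)}(y) = \frac{\Gamma(\theta+k\alpha+1)}{\Gamma((\theta+k\alpha)/\alpha+1)}\, y^{\theta/\alpha+k}\, g_{\alpha}(y). \tag{27}$$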



As for $(S_m/m | K_n = k)$, for each $m$ this is the proportion of successes in a Beta-Binomial distribution of parameters $(m, \theta + k\alpha, n - k\alpha)$, to which the same asymptotic properties of the Polya urn distribution apply (see e.g. Johnson & Kotz, 1977). It follows that as $m \to \infty$ almost surely

$$\left(\frac{S_m}{m}\, \Big|\, K_n = k\right) \to W \sim \text{Beta}(\theta+k\alpha, n-k\alpha). \tag{28}$$

We are now in a position to state the following



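Proposition 7. Under the $(\alpha, \theta)$ Poisson-Dirichlet model $(K_m/m^{\alpha} | K_n = k)$ converges almost surely to a r.v. $Z_{n,k}^{\alpha,\theta}$ with limit density

$$f_Z(z) = \frac{\Gamma(\theta+n)}{\Gamma(\theta/\alpha+k)\, \Gamma(n-k\alpha)\, \alpha}\, z^{\theta/\alpha+k-1} \int_z^{\infty} \left[1 - (z v^{-1})^{1/\alpha}\right]^{n-k\alpha-1} f_{\alpha}(v^{-1/\alpha})\, v^{-1/\alpha-1}\, dv,$$

for $f_{\alpha}(\cdot)$ the density of the $\alpha$-stable r.v. for $\alpha \in (0, 1)$.

Proof: By (27) and (28) the almost sure limit of $(K_m/m^{\alpha} | K_n = k)$ exists as the product of independent r.v.s each admitting an almost sure limit, hence

$$\left(\frac{K_m}{m^{\alpha}}\, \Big|\, K_n = k\right) \to Z_{n,k}^{\alpha,\theta} = Y_{(\theta+k\alpha)/\alpha} \times W^{\alpha}.$$

The density of $Z_{n,k}^{\alpha,\theta}$ is given by

$$f_Z(z) = \int_0^1 f_Y(z w^{-\alpha})\, w^{-\alpha} f_W(w)\, dw = \frac{\Gamma(\theta+k\alpha+1)}{\alpha\Gamma(\theta/\alpha+k+1)} \int_0^1 (z w^{-\alpha})^{\theta/\alpha+k-1-1/\alpha} f_{\alpha}\left[(z w^{-\alpha})^{-1/\alpha}\right] w^{-\alpha}\, \frac{\Gamma(\theta+n)}{\Gamma(\theta+k\alpha)\, \Gamma(n-k\alpha)}\, w^{\theta+k\alpha-1} (1-w)^{n-k\alpha-1}\, dw,$$

which simplifies to

$$= \frac{\Gamma(\theta+n)}{\Gamma(\theta/\alpha+k)\, \Gamma(n-k\alpha)} \int_0^1 z^{\theta/\alpha+k-1-1/\alpha} (1-w)^{n-k\alpha-1} f_{\alpha}\left[(z w^{-\alpha})^{-1/\alpha}\right] dw,$$

and by the change of variable $z w^{-\alpha} = v$, $w = (z v^{-1})^{1/\alpha}$, $|dw| = \alpha^{-1} z^{1/\alpha} v^{-1/\alpha-1}\, dv$, with $v$ ranging over $(z, \infty)$, it follows

$$= \frac{\Gamma(\theta+n)}{\Gamma(\theta/\alpha+k)\, \Gamma(n-k\alpha)\, \alpha}\, z^{\theta/\alpha+k-1} \int_z^{\infty} \left[1 - (z v^{-1})^{1/\alpha}\right]^{n-k\alpha-1} f_{\alpha}(v^{-1/\alpha})\, v^{-1/\alpha-1}\, dv.$$

□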



The next Proposition proves both the convergence in $r$-th mean of $(K_m/m^{\alpha} | K_n = k)$ to $Z_{n,k}^{\alpha,\theta}$ and that our result, while agreeing with Favaro et al. (2009, Proposition 2), provides a new decomposition for the limit law.

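Proposition 8. Let $H = Y_1 \times X$ for $Y_1$ and $X$ independent r.v.s, $Y_1 \sim g_{\alpha,(\theta+n)}$ and $X \sim \text{Beta}(\theta/\alpha+k, n/\alpha-k)$; then $Z_{n,k}^{\alpha,\theta}$ and $H$ have the same characteristic function

$$\sum_{r \geq 0} \frac{(it)^r}{r!} \left(\frac{\theta+k\alpha}{\alpha}\right)_r \frac{1}{(\theta+n)_{r\alpha}}.$$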



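Proof. First notice that Proposition 6 is enough to say that for $m \to \infty$

$$\mathbb{E}_{\alpha,\theta}\left[\left(\frac{K_m}{m^{\alpha}}\right)^r \Big|\, K_n = k\right] \to \left(\frac{\theta+k\alpha}{\alpha}\right)_r \frac{\Gamma(\theta+n)}{\Gamma(\theta+n+r\alpha)}.$$

Now the density of $Z_{n,k}^{\alpha,\theta}$ may be written as

$$f_Z(z) = \frac{\Gamma(\theta+n)}{\Gamma(\theta/\alpha+k)\, \Gamma(n-k\alpha)\, \alpha}\, z^{\theta/\alpha+k-1} \int_z^{\infty} \left[1 - (z/s)^{1/\alpha}\right]^{n-k\alpha-1} f_{\alpha}(s^{-1/\alpha})\, s^{-1/\alpha-1}\, ds,$$

whose characteristic function, by (18) and (19), is given by

$$\frac{\Gamma(\theta+n)\, \Gamma(\theta+k\alpha+1)}{\Gamma(\theta+k\alpha)\, \Gamma(n-k\alpha)\, \Gamma((\theta+k\alpha)/\alpha+1)\, \alpha} \int_0^{\infty} \int_z^{\infty} \exp\{itz\}\, z^{\theta/\alpha+k-1}\, g_{\alpha}(s) \left(1 - (z/s)^{1/\alpha}\right)^{n-k\alpha-1} ds\, dz;$$

this may be rewritten as

$$= \frac{\Gamma(\theta+k\alpha+1)}{\Gamma((\theta+k\alpha)/\alpha+1)\, \alpha} \int_0^{\infty} g_{\alpha}(s) \int_0^{s} e^{itz}\, \frac{\Gamma(\theta+n)}{\Gamma(\theta+k\alpha)\, \Gamma(n-k\alpha)}\, z^{\theta/\alpha+k-1} \left(1 - (z/s)^{1/\alpha}\right)^{n-k\alpha-1} dz\, ds,$$

and by a change of variable $(z/s)^{1/\alpha} = y$, $z = y^{\alpha} s$, $dz = s \alpha y^{\alpha-1}\, dy$ reduces to

$$= \frac{\Gamma(\theta+k\alpha+1)}{\Gamma((\theta+k\alpha)/\alpha+1)\, \alpha} \int_0^{\infty} g_{\alpha}(s) \int_0^1 e^{it y^{\alpha} s}\, \frac{\Gamma(\theta+n)}{\Gamma(\theta+k\alpha)\, \Gamma(n-k\alpha)}\, (y^{\alpha} s)^{\theta/\alpha+k-1} (1-y)^{n-k\alpha-1}\, s \alpha y^{\alpha-1}\, dy\, ds,$$

and then to

$$= \frac{\Gamma(\theta+k\alpha+1)}{\Gamma((\theta+k\alpha)/\alpha+1)} \int_0^{\infty} s^{\theta/\alpha+k}\, g_{\alpha}(s) \int_0^1 e^{it y^{\alpha} s}\, \frac{\Gamma(\theta+n)}{\Gamma(\theta+k\alpha)\, \Gamma(n-k\alpha)}\, y^{\theta+k\alpha-1} (1-y)^{n-k\alpha-1}\, dy\, ds.$$

By the characteristic function of $Y^{\alpha}$ for $Y \sim \text{Beta}(\theta+k\alpha, n-k\alpha)$ we can write

$$= \frac{\Gamma(\theta+k\alpha+1)}{\Gamma((\theta+k\alpha)/\alpha+1)} \int_0^{\infty} s^{\theta/\alpha+k} \sum_{r \geq 0} \frac{(its)^r}{r!}\, \frac{(\theta+k\alpha)_{r\alpha}}{(\theta+n)_{r\alpha}}\, g_{\alpha}(s)\, ds = \frac{\Gamma(\theta+k\alpha+1)}{\Gamma((\theta+k\alpha)/\alpha+1)} \sum_{r \geq 0} \frac{(it)^r}{r!}\, \frac{(\theta+k\alpha)_{r\alpha}}{(\theta+n)_{r\alpha}} \int_0^{\infty} s^{\theta/\alpha+k+r}\, g_{\alpha}(s)\, ds,$$

and by (18)

$$= \sum_{r \geq 0} \frac{(it)^r}{r!}\, \frac{(\theta+k\alpha)_{r\alpha}}{(\theta+n)_{r\alpha}}\, \frac{\Gamma(\theta+k\alpha+1)}{\Gamma((\theta+k\alpha)/\alpha+1)}\, \frac{\Gamma((\theta+k\alpha+r\alpha)/\alpha+1)}{\Gamma(\theta+k\alpha+r\alpha+1)}.$$

By the usual properties of the Gamma function the last expression corresponds to

$$\sum_{r \geq 0} \frac{(it)^r}{r!}\, \frac{\Gamma(\theta+k\alpha+r\alpha)\, \Gamma(\theta+n)}{\Gamma(\theta+k\alpha)\, \Gamma(\theta+n+r\alpha)}\, \frac{(\theta+k\alpha)\, \Gamma(\theta+k\alpha)}{\frac{\theta+k\alpha}{\alpha}\, \Gamma\left(\frac{\theta+k\alpha}{\alpha}\right)}\, \frac{\frac{\theta+k\alpha+r\alpha}{\alpha}\, \Gamma\left(\frac{\theta+k\alpha}{\alpha}+r\right)}{(\theta+k\alpha+r\alpha)\, \Gamma(\theta+k\alpha+r\alpha)},$$

which simplifies to

$$= \sum_{r \geq 0} \frac{(it)^r}{r!} \left(\frac{\theta+k\alpha}{\alpha}\right)_r \frac{1}{(\theta+n)_{r\alpha}},$$

and the conclusion follows by the result in Proposition 2 in Favaro et al. (2009).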
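The agreement of the two decompositions can also be checked at the level of moments. The sketch below assumes that the $r$-th moment of a r.v. with density $g_{\alpha,\theta}$ is the constant appearing in the moment asymptotics preceding (22), and uses the standard formula for the moments of a Beta r.v.; it then compares $\mathbb{E}(Y_1^r)\, \mathbb{E}(X^r)$ and $\mathbb{E}(Y^r)\, \mathbb{E}(W^{r\alpha})$ with the limit constant of Proposition 6. Function names and parameter values are hypothetical.

```python
from math import exp, isclose, lgamma

def ml_moment(r, alpha, theta):
    """r-th moment of the limit of K_n / n^alpha under PD(alpha, theta),
    read off the moment asymptotics stated before (22)."""
    return exp(lgamma(theta / alpha + r + 1) + lgamma(theta + 1)
               - lgamma(theta + r * alpha + 1) - lgamma(theta / alpha + 1))

def beta_moment(r, a, b):
    """E(X^r) for X ~ Beta(a, b); r need not be an integer."""
    return exp(lgamma(a + r) + lgamma(a + b) - lgamma(a) - lgamma(a + b + r))

def limit_moment(r, n, k, alpha, theta):
    """Limit constant of Proposition 6:
    ((theta + k alpha)/alpha)_r * Gamma(theta+n) / Gamma(theta+n+r alpha)."""
    g = theta / alpha + k
    return exp(lgamma(g + r) - lgamma(g) + lgamma(theta + n) - lgamma(theta + n + r * alpha))

# Illustrative values (hypothetical choice).
alpha, theta, n, k = 0.5, 1.0, 10, 4
rows = []
for r in range(1, 6):
    # Decomposition of Proposition 8: H = Y_1 * X.
    h = ml_moment(r, alpha, theta + n) * beta_moment(r, theta / alpha + k, n / alpha - k)
    # Decomposition of Proposition 7: Z = Y * W^alpha.
    z = ml_moment(r, alpha, theta + k * alpha) * beta_moment(r * alpha, theta + k * alpha, n - k * alpha)
    rows.append((h, z, limit_moment(r, n, k, alpha, theta)))
    print(r, h, z, rows[-1][2])
```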